Method and apparatus for adapting a class entity dictionary used with language models

ABSTRACT

A method and apparatus are provided for augmenting a language model with a class entity dictionary based on corrections made by a user. Under the method and apparatus, a user corrects an output that is based in part on the language model by replacing an output segment with a correct segment. The correct segment is added to a class of segments in the class entity dictionary and a probability of the correct segment given the class is estimated based on an n-gram probability associated with the output segment and an n-gram probability associated with the class. This estimated probability is then used to generate further outputs.

BACKGROUND OF THE INVENTION

[0001] The present invention relates to language models. In particular, the present invention relates to adapting language models based on user input.

[0002] Language models provide a measure of the likelihood of a series of words appearing in a string of text. Such models are used in speech recognition, Chinese word segmentation, and phonetic-to-character conversion, such as pinyin-to-hanzi conversion in Chinese, to identify a most likely sequence of words given a lattice of possible sequences. For example, in speech recognition, a language model would identify the phrase “go to bed” as being more likely than the phonetically similar phrase “go too bed”.

[0003] Typically, language models are trained on a corpus of sentences. Although such corpora are effective for training language models to handle general words, they are not very effective for training language models to handle proper nouns such as the names of people and businesses. The reason for this is that proper names do not occur with enough frequency in a corpus to be accurately modeled.

[0004] Some systems allow users to correct mistakes made by the language model. However, even after a system knows about the correction, there is no way for the system to adjust the language model based on the correction because there is no way to assess the probability of the word sequence formed by the correction. Because of this, the system will generally make the same mistake later when it encounters the same input.

[0005] Thus, a system is needed that allows a language model and a dynamic dictionary to be modified based on corrections made by a user.

SUMMARY OF THE INVENTION

[0006] A method and apparatus are provided for augmenting a language model with a class entity dictionary based on corrections made by a user. Under the method and apparatus, a user corrects an output that is based in part on the language model by replacing an output segment with a correct segment. The correct segment is added to a class of segments in the class entity dictionary, and a probability of the correct segment given the class is estimated based on an n-gram probability associated with the output segment and an n-gram probability associated with the class. This estimated probability is then used to generate further outputs.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram of one computing environment in which the present invention may be practiced.

[0008] FIG. 2 is a block diagram of an alternative computing environment in which the present invention may be practiced.

[0009] FIG. 3 is a flow diagram for updating a class entity dictionary under one embodiment of the present invention.

[0010] FIG. 4 is a block diagram of a pinyin-to-character conversion embodiment of the present invention.

[0011] FIG. 5 is a flow diagram for utilizing a class entity dictionary under one embodiment of the present invention.

[0012] FIG. 6 is a block diagram of a pattern recognition system embodiment of the present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

[0013] FIG. 1 illustrates an example of a suitable computing system environment 100 on which the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

[0014] The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, telephony systems, distributed computing environments that include any of the above systems or devices, and the like.

[0015] The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

[0016] With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

[0017] Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

[0018] The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.

[0019] The computer 110 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only, FIG. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.

[0020] The drives and their associated computer storage media discussed above and illustrated in FIG. 1 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 1, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.

[0021] A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.

[0022] The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

[0023] When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 1 illustrates remote application programs 185 as residing on remote computer 180. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

[0024] FIG. 2 is a block diagram of a mobile device 200, which is an alternative exemplary computing environment. Mobile device 200 includes a microprocessor 202, memory 204, input/output (I/O) components 206, and a communication interface 208 for communicating with remote computers or other mobile devices. In one embodiment, the afore-mentioned components are coupled for communication with one another over a suitable bus 210.

[0025] Memory 204 is implemented as non-volatile electronic memory such as random access memory (RAM) with a battery back-up module (not shown) such that information stored in memory 204 is not lost when the general power to mobile device 200 is shut down. A portion of memory 204 is preferably allocated as addressable memory for program execution, while another portion of memory 204 is preferably used for storage, such as to simulate storage on a disk drive.

[0026] Memory 204 includes an operating system 212, application programs 214 as well as an object store 216. During operation, operating system 212 is preferably executed by processor 202 from memory 204. Operating system 212, in one preferred embodiment, is a WINDOWS® CE brand operating system commercially available from Microsoft Corporation. Operating system 212 is preferably designed for mobile devices, and implements database features that can be utilized by applications 214 through a set of exposed application programming interfaces and methods. The objects in object store 216 are maintained by applications 214 and operating system 212, at least partially in response to calls to the exposed application programming interfaces and methods.

[0027] Communication interface 208 represents numerous devices and technologies that allow mobile device 200 to send and receive information. The devices include wired and wireless modems, satellite receivers and broadcast tuners to name a few. Mobile device 200 can also be directly connected to a computer to exchange data therewith. In such cases, communication interface 208 can be an infrared transceiver or a serial or parallel communication connection, all of which are capable of transmitting streaming information.

[0028] Input/output components 206 include a variety of input devices such as a touch-sensitive screen, buttons, rollers, and a microphone as well as a variety of output devices including an audio generator, a vibrating device, and a display. The devices listed above are by way of example and need not all be present on mobile device 200. In addition, other input/output devices may be attached to or found with mobile device 200 within the scope of the present invention.

[0029] The present invention provides a means for using and adapting a statistical language model and a class-based dictionary in various applications. A statistical language model provides the likelihood that a sequence of words will appear in a language. In general, an n-gram language model defines the probability of a sequence of words as:

$$\Pr(H) = \Pr(w_1)\,\Pr(w_2 \mid w_1)\cdots\Pr(w_i \mid w_{i-(n-1)},\ldots,w_{i-1})\cdots\Pr(w_t \mid w_{t-(n-1)},\ldots,w_{t-1}) \qquad \text{EQ. 1}$$

[0030] where H is a sequence of words $w_1, w_2, \ldots, w_t$, t is the number of words in the sequence, n−1 is the number of past words that are used to predict the next word, and $\Pr(w_i \mid w_{i-(n-1)},\ldots,w_{i-1})$ is the probability of the ith word given the n−1 preceding words. Thus, in a bigram language model n=2, and in a trigram language model n=3.
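
As a concrete illustration, the following sketch evaluates EQ. 1 for a word sequence. The `ngram_prob(word, history)` lookup is a hypothetical interface assumed for illustration, not part of this disclosure.

```python
def sequence_probability(words, ngram_prob, n=3):
    """Evaluate EQ. 1: the product of each word's n-gram probability
    given its (up to n-1) preceding words. `ngram_prob(word, history)`
    is an assumed lookup returning Pr(word | history)."""
    prob = 1.0
    for i, word in enumerate(words):
        history = tuple(words[max(0, i - (n - 1)):i])
        prob *= ngram_prob(word, history)
    return prob
```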

[0031] One problem with statistical language models is that they do not provide accurate probabilities for unknown or rarely used words such as proper nouns. To overcome this, the present invention utilizes a class-based language model.

[0032] In the class-based language model of the present invention, the model predicts the probability of sequences of classes of words and individual words. To do this, equation 1 is modified when a class is used in place of one or more words. For example, for a trigram language model, the probabilities calculated in connection with a class token N are:

$$\Pr(H) = \cdots\Pr(w_{i-1} \mid w_{i-3}, w_{i-2})\,\Pr(T_i \mid N_i)\,\Pr(N_i \mid w_{i-2}, w_{i-1})\,\Pr(w_{i+1} \mid w_{i-1}, N_i)\,\Pr(w_{i+2} \mid N_i, w_{i+1})\cdots \qquad \text{EQ. 2}$$

[0033] where only those probabilities near the class probabilities are shown in equation 2 for simplicity, $N_i$ is the class at the ith position in the sequence, $T_i$ is an entity in class N, and $\Pr(T_i \mid N_i)$ is referred to as an inside probability that provides the probability of entity T given class N. Under one embodiment, the inside probability is provided by a class entity dictionary that defines the words found in each class. Under one embodiment, the class entity dictionary provides a set of grammar rules that define known words that are found in particular classes. For example, a context-free grammar for the class [NAME] may include the rule “Hank Hanson”. The outside probabilities (the probability of a class given previous words) are provided by the class-based language model.
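
A minimal sketch of how EQ. 2 might be scored, assuming the same hypothetical `ngram_prob` lookup as above plus an `inside_prob(entity, cls)` lookup into the class entity dictionary; class tokens are represented here as (class, entity) pairs.

```python
def class_sequence_probability(tokens, ngram_prob, inside_prob, n=3):
    """Score a mixed sequence of plain words and (class, entity)
    tokens per EQ. 2. For a class token, the outside probability
    Pr(N_i | history) from the language model is multiplied by the
    inside probability Pr(T_i | N_i) from the class entity
    dictionary; the class token, not the entity, enters the
    n-gram history."""
    prob, history = 1.0, []
    for token in tokens:
        if isinstance(token, tuple):                 # class token (N_i, T_i)
            cls, entity = token
            prob *= ngram_prob(cls, tuple(history[-(n - 1):]))  # outside
            prob *= inside_prob(entity, cls)                    # inside
            history.append(cls)
        else:                                        # ordinary word w_i
            prob *= ngram_prob(token, tuple(history[-(n - 1):]))
            history.append(token)
    return prob
```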

[0034] A class-based language model helps to overcome the sparseness problem associated with certain classes of words such as proper nouns. Such words appear so infrequently in the training data that a language model that does not use classes will always prefer more common words over the infrequently used words. By using classes, the language model of the present invention increases the likelihood of a class being identified, since a class of words occurs more frequently in the training data than an individual word of the class.

[0035] Before a class-based language model or class entity dictionary can be used, they must be trained. Under one embodiment, the class-based language model and the class entity dictionary are initially trained by first tagging a training corpus to identify words that fall within classes based on a set of heuristics. The heuristics provide a set of rules that predict the location of a class of words based on other words in the input. For example, if the verb “call” is a possible word in the input, the heuristic rules may indicate that the next word or next two words after “call” should be considered part of the [NAME] class (for example, “Call Jack Jones”).
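
A toy version of such a heuristic rule might look like the following; the trigger-verb list and the fixed two-word window are illustrative assumptions, not rules taken from this disclosure.

```python
def tag_names(words, triggers=("call", "phone", "dial")):
    """Tag up to two words following a trigger verb as a [NAME]
    candidate (toy heuristic; real rules would be richer)."""
    tagged, i = [], 0
    while i < len(words):
        if words[i].lower() in triggers and i + 1 < len(words):
            span = words[i + 1:i + 3]        # next one or two words
            tagged.append(words[i])
            tagged.append(("[NAME]", " ".join(span)))
            i += 1 + len(span)
        else:
            tagged.append(words[i])
            i += 1
    return tagged

# tag_names("please call jack jones".split())
# -> ['please', 'call', ('[NAME]', 'jack jones')]
```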

[0036] The words that are identified using the heuristics are replaced with their class, and the class-based language model is then trained using standard training techniques on the words and classes in the corpus.

[0037] The class entity dictionary is initially trained by dividing the words identified for each class into sub-components. These sub-components are then used in a standard n-gram training technique to identify probabilities for the words given the class. Such probabilities form the inside probabilities for the class.
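
A simplified sketch of that initial estimation, collapsing the n-gram training over sub-components to unigram relative frequencies for brevity; `class_examples`, mapping each class to the entities tagged for it in the corpus, is an assumed input.

```python
from collections import Counter

def train_inside_probabilities(class_examples):
    """Estimate initial inside probabilities Pr(entity | class) as
    relative frequencies (a simplification of the n-gram training
    over sub-components described above)."""
    inside = {}
    for cls, entities in class_examples.items():
        counts = Counter(entities)
        total = sum(counts.values())
        inside[cls] = {e: c / total for e, c in counts.items()}
    return inside

# train_inside_probabilities({"[NAME]": ["Hank Hanson", "Jack Jones",
#                                        "Jack Jones"]})
# -> {'[NAME]': {'Hank Hanson': 0.333..., 'Jack Jones': 0.666...}}
```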

[0038] An additional aspect of the present invention provides for updating and expanding the class entity dictionary and the class-based language model based on input provided by the user. In particular, the class entity dictionary is expanded when a user changes a decoded sequence of words so that the modified sequence includes a word that is not in the class entity dictionary. For example, if the class-based language model and class entity dictionary decode the sequence of words “write a letter to Phil” and the user changes the sequence to “write a letter to Bill”, “Bill” will be added to the class entity dictionary if it was not previously in the class entity dictionary.

[0039] In order to add an entity to the class entity dictionary, an inside probability for the entity must be determined. As noted above, the inside probability provides the probability of an entity given a class. This probability cannot be calculated exactly because there is not enough data to establish the likelihood of the entity given the class. Instead, under embodiments of the present invention, this inside probability is estimated by assuming that, at a minimum, the product of the inside probability for the entity and the language model probability for the class should be equal to the language model probability for the word that was incorrectly identified by the decoder.

[0040] In terms of an equation for a trigram language model, this assumption reads as:

$$\Pr(T_i \mid N_i)\,\Pr(N_i \mid w_{i-2}, w_{i-1}) = \Pr(p_i \mid w_{i-2}, w_{i-1}) \qquad \text{EQ. 3}$$

[0041] where $\Pr(T_i \mid N_i)$ is the inside probability of the modified entity $T_i$ given the class $N_i$, $\Pr(N_i \mid w_{i-2}, w_{i-1})$ is the language model probability for class $N_i$ given the two preceding words in the sequence, and $\Pr(p_i \mid w_{i-2}, w_{i-1})$ is the language model probability for the incorrect entity $p_i$ that was decoded and later modified to form the modified entity $T_i$.

[0042] Using this assumption, the inside probability is then estimated as:

$$\Pr(T_i \mid N_i) = \frac{\Pr(p_i \mid w_{i-2}, w_{i-1})}{\Pr(N_i \mid w_{i-2}, w_{i-1})} \qquad \text{EQ. 4}$$

[0043] However, this estimate is highly dependent on the preceding words in the sequence. To lower this dependence and thus make the estimate more general, the probability is re-written as:

$$\Pr(T_i \mid N_i) = \frac{\Pr(p_i \mid \langle\text{unknown}\rangle_{i-2}, \langle\text{unknown}\rangle_{i-1})}{\Pr(N_i \mid \langle\text{unknown}\rangle_{i-2}, \langle\text{unknown}\rangle_{i-1})} \qquad \text{EQ. 5}$$

[0044] where $\Pr(p_i \mid \langle\text{unknown}\rangle_{i-2}, \langle\text{unknown}\rangle_{i-1})$ represents the probability of $p_i$ given any two preceding words, and $\Pr(N_i \mid \langle\text{unknown}\rangle_{i-2}, \langle\text{unknown}\rangle_{i-1})$ represents the probability of class $N_i$ given any two preceding words. Note that both probabilities are stored in the language model during training by replacing preceding words with the <unknown> tokens and determining the probability of $p_i$ and $N_i$ given the <unknown> tokens.
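
In code, the EQ. 5 estimate reduces to a single lookup ratio; `ngram_prob` is the same assumed interface as in the sketches above, and the `<unknown>`-conditioned probabilities are the ones stored during training.

```python
def estimate_inside_probability(decoded_word, cls, ngram_prob,
                                unk="<unknown>"):
    """EQ. 5: estimate Pr(T_i | N_i) for a newly added entity as the
    ratio of the decoded (incorrect) word's probability to the
    class's probability, each conditioned on two <unknown> tokens."""
    return ngram_prob(decoded_word, (unk, unk)) / ngram_prob(cls, (unk, unk))
```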

[0045] Once the probability has been estimated for the modified entity, the modified entity and the estimated inside probability are added to the class entity dictionary under the appropriate class.

[0046] User modifications to the decoded sequence of words do not always involve words that were not present in the class entity dictionary. Instead, either the original decoded word or the modified word may have been present in the class entity dictionary. FIG. 3 provides a flow diagram of the steps used to determine how to alter the class entity dictionary based on user modifications.

[0047] As an overview, the process of FIG. 3 can adjust the inside probabilities in three ways. For a modified word that was already in the class entity dictionary, the fact that the word was not decoded indicates that its inside probability is too low. As such, its probability must be increased. For a decoded word that is in the class entity dictionary, the fact that the user modified the word indicates that the decoded word's inside probability is too high. As such, its probability must be decreased. For a modified word that is not in the dictionary, the modified word must be added to the dictionary and its initial probability calculated using Equation 5 above.

[0048] To determine which adjustment to make, the process of FIG. 3 begins at step 300, where the sequence of words produced by the user modification is examined to determine if the modified word is in the class entity dictionary. If the modified word is in the class entity dictionary, a determination is made at step 320 as to whether the modified word is found in only a single class.

[0049] If the modified word is found in more than one class, the class-based language model is used to select the most likely class by using each of the possible classes in a separate sequence and identifying the sequence that provides the highest likelihood. This is shown as step 322 in FIG. 3.

[0050] If the modified word is found in only a single class at step 320, or after a single class has been identified at step 322, the inside probability for the modified word needs to be adjusted: even though the modified word was in the class entity dictionary, the decoder did not identify it from the input because its inside probability was too low. To correct this, the inside probability stored in the class entity dictionary for the modified word is increased at step 324. Under some embodiments, the inside probability is increased by multiplying it by a factor of 1.5.

[0051] If the modified word is not in the class entity dictionary at step 300, a set of heuristics is used at step 302 to determine possible classes for the modified word. Each of these classes is then used to build a separate sequence of words with the other decoded words. The class-based language model is then used to identify the most likely sequence and thus the most likely class for the modified word.

[0052] If a class can be identified for the modified word at step 304, an inside probability for the modified word is determined using equation 5 above at step 308, and the modified word and probability are added to the class entity dictionary at step 310.

[0053] If a class cannot be identified for the modified word at step 304, the word that was decoded and modified by the user is examined at step 312 to determine if the decoded word is in the class entity dictionary. If the decoded word is in the dictionary at step 312, the fact that the decoded word was identified instead of the modified word means that the inside probability for the decoded word is set too high. To correct this, the inside probability for the decoded word is decreased at step 314. Under many embodiments, the inside probability is reduced by a factor of 1.5 (in other words, the inside probability is divided by 1.5 to form the new probability).

[0054] If the decoded word is not in the class entity dictionary at step 312, no changes need to be made to the class entity dictionary, since neither the decoded word nor the modified word falls within a class. As such, the class entity dictionary is left unchanged at step 318.
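
Pulling the branches of FIG. 3 together, a condensed sketch of the update policy; the class-lookup, language-model selection, heuristic, and EQ. 5 helpers are assumed interfaces, and 1.5 is the adjustment factor cited for some embodiments.

```python
def update_dictionary(decoded_word, modified_word, dictionary,
                      classes_of, best_class_by_lm, heuristic_classes,
                      estimate_inside_prob, factor=1.5):
    """FIG. 3 update policy. `dictionary` maps (class, word) to an
    inside probability; the helper callables are assumed interfaces."""
    known = classes_of(modified_word)                      # step 300
    if known:
        # steps 320/322: pick the single class, or the likeliest one
        cls = known[0] if len(known) == 1 else best_class_by_lm(known)
        dictionary[(cls, modified_word)] *= factor         # step 324: raise it
        return
    candidates = heuristic_classes(modified_word)          # step 302
    cls = best_class_by_lm(candidates) if candidates else None
    if cls is not None:                                    # step 304
        # steps 308-310: estimate via EQ. 5 and add to the dictionary
        dictionary[(cls, modified_word)] = estimate_inside_prob(
            decoded_word, cls)
        return
    for dcls in classes_of(decoded_word):                  # step 312
        dictionary[(dcls, decoded_word)] /= factor         # step 314: lower it
    # step 318: neither word falls within a class, so nothing changes
```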

[0055] The class-based language model and the method of updating a class-based language model under the present invention may be used in many systems. For example, FIG. 4 provides a block diagram of a phonetic-to-character conversion system 400 that can be implemented in the environments of FIGS. 1 and 2 and that utilizes an embodiment of the present invention. The operation of this system is shown in the flow diagram of FIG. 5.

[0056] At step 500 of FIG. 5, phonetic input 402, which is the phonetic description of characters found in a character-based language such as Chinese, Japanese, or Korean, is provided to a decoder 404. In Chinese, one embodiment of the phonetic input is pinyin input. At step 502, decoder 404 first builds a lattice of possible words that can be represented by the phonetic input using a lexicon 406. The lattice is then expanded at step 504 by identifying class entities from the words in the lattice using class entity dictionary 412 and heuristic rules 416. The identified classes are added as separate nodes in the lattice.

[0057] At step 506, decoder 404 determines a probability for each path through the lattice using a phonetic model 408, which provides the probability that each word along the path will represent a phonetic segment; the class entity dictionary, which provides the inside probability for the classes; a language model 410, which provides the probability of a sequence of words and/or classes occurring in a language; and equation 2 above. The sequence of words along the path that provides the highest probability is then output as the decoded string of words at step 508.
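
Schematically, the path selection at steps 506-508 might look like the following, with exhaustive enumeration standing in for the decoder's dynamic programming; `phonetic_prob` and `lm_score` (the EQ. 2 score from the earlier sketch) are assumed interfaces.

```python
def best_path(lattice_paths, phonetic_prob, lm_score):
    """Return the path with the highest combined probability
    (steps 506-508). Each path is a list of tokens as in the EQ. 2
    sketch; `phonetic_prob(word)` gives the probability that the
    word accounts for its phonetic segment."""
    def score(path):
        p = lm_score(path)
        for token in path:
            word = token[1] if isinstance(token, tuple) else token
            p *= phonetic_prob(word)
        return p
    return max(lattice_paths, key=score)
```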

[0058] After the decoded sequence has been provided to the user, the system can receive user modifications 420 at step 510. Such a modification indicates the correct words that the user intended by their input. At step 512, the user modification is examined to determine how it should be used to alter the class entity dictionary using the process of FIG. 3. In particular, class extraction unit 422 uses heuristics 416 and class entity dictionary 412 to identify a class for the modified word and to determine if the decoded word or the modified word is in the class entity dictionary. A probability determination unit 424 then calculates a probability for the modified word if it was not present in the dictionary, or determines a new probability for the modified word or the decoded word to improve the performance of the decoder, as described above in connection with FIG. 3.

[0059] In a second embodiment, the class-based language model of the present invention is used in a speech recognition system such as the speech recognition system of FIG. 6. In FIG. 6, an input speech signal from a speaker 600 and additive noise 602 are converted into an electrical signal by a microphone 604, which is connected to an analog-to-digital (A-to-D) converter 606.

[0060] A-to-D converter 606 converts the analog signal from microphone 604 into a series of digital values. In several embodiments, A-to-D converter 606 samples the analog signal at 16 kHz and 16 bits per sample, thereby creating 32 kilobytes of speech data per second.
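
That data rate follows directly from the sampling parameters:

$$16{,}000\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}} \times \tfrac{1\ \text{byte}}{8\ \text{bits}} = 32{,}000\ \tfrac{\text{bytes}}{\text{s}} \approx 32\ \text{KB/s}$$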

[0061] The digital data created by A-to-D converter 606 is provided to an optional noise reduction module 608, which removes some of the noise in the digital signal using one or more noise reduction techniques.

[0062] The output of noise reduction module 608 is provided to a feature extractor 610, which extracts a feature from the digital speech signal. Examples of feature extraction modules include modules for performing Linear Predictive Coding (LPC), LPC-derived cepstrum, Perceptive Linear Prediction (PLP), Auditory model feature extraction, and Mel-Frequency Cepstrum Coefficients (MFCC) feature extraction. Note that the invention is not limited to these feature extraction modules and that other modules may be used within the context of the present invention.

[0063] The feature extraction module receives the stream of digital values from noise reduction module 608 and produces a stream of feature vectors that are each associated with a frame of the speech signal. In many embodiments, the centers of the frames are separated by 10 milliseconds.

[0064] Note that although noise reduction module 608 is shown before feature extractor 610 in the embodiment of FIG. 6, in other embodiments noise reduction module 608 appears after feature extractor 610.

[0065] The stream of feature vectors produced by the extraction module is provided to a decoder 612, which identifies a most likely sequence of words based on the stream of feature vectors, a lexicon 614, a language model 616, an acoustic model 618, heuristic rules 622, and a class entity dictionary 620.

[0066] Acoustic model 618 provides a probability that an input feature vector was created by the pronunciation of a linguistic unit such as a senone, phoneme, diphone, or triphone.

[0067] Language model 616, class entity dictionary 620, and heuristic rules 622 are used by decoder 612 in a manner similar to the way decoder 404 uses language model 410, class entity dictionary 412, and heuristic rules 416.

[0068] Based on the acoustic model, the language model, the lexicon, the class entity dictionary, and the heuristic rules, decoder 612 identifies a most likely sequence of words from all possible word sequences. In particular, decoder 612 uses steps 500, 502, 504, 506, and 508 of FIG. 5 to identify the most likely word sequence.

[0069] The most probable word sequence is then subjected to possible user modification 630. If the user modifies words in the decoded sequence, the modified words are provided to a class extraction unit 632 and a probability determination unit 634, which operate in a manner similar to class extraction unit 422 and probability determination unit 424 of FIG. 4. Using the process of FIG. 3, the class entity dictionary 620 is then modified based on the user modifications of the decoded words.

[0070] Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.

What is claimed is:
1. A method of decoding input, the method comprising: identifying possible sequences of words from the input; using a class-based language model and a class entity dictionary to select one of the possible sequences of words as an output sequence; receiving modifications made to the output sequence; and using the modifications to change the class entity dictionary.

2. The method of claim 1 wherein using the modifications to change the class entity dictionary comprises using the modifications to add an entity to the class entity dictionary.

3. The method of claim 2 wherein adding an entity to the class entity dictionary comprises adding an entity to a class in the class entity dictionary.

4. The method of claim 3 wherein adding an entity further comprises estimating a probability for the added entity given the class to which the entity is added.

5. The method of claim 4 wherein receiving a modification comprises receiving a modified entity that represents a modification of a decoded entity in the output sequence and wherein adding an entity comprises adding the modified entity.

6. The method of claim 5 wherein estimating a probability for the entity comprises estimating a probability based in part on a probability associated with the decoded entity.

7. The method of claim 6 wherein estimating a probability for the entity comprises estimating the probability based on an n-gram probability associated with the decoded entity and an n-gram probability associated with the class to which the modified entity is added.

8. The method of claim 1 wherein using the modifications to change the class entity dictionary comprises increasing a probability associated with an entity in the class entity dictionary.

9. The method of claim 8 wherein receiving modifications comprises receiving a modified entity that represents a modification of a decoded entity in the output sequence and wherein the modified entity is found in the class entity dictionary.

10. The method of claim 1 wherein using the modifications to change the class entity dictionary comprises decreasing a probability associated with an entity in the class entity dictionary.

11. The method of claim 10 wherein receiving modifications comprises receiving a modified entity that represents a modification of a decoded entity in the output sequence and wherein the modified entity is not found in the class entity dictionary but the decoded entity is found in the class entity dictionary.

12. The method of claim 11 wherein decreasing the probability of an entity comprises decreasing the probability of the decoded entity.

13. A computer-readable medium having computer-executable instructions for performing steps comprising: generating a sequence of words based in part on a class entity dictionary that provides probabilities for entities in at least one class; receiving a modification to the sequence of words such that a decoded entity in the sequence of words is modified into a modified entity; and setting a probability of an entity in the class entity dictionary based at least in part on at least one of the decoded entity and the modified entity.

14. The computer-readable medium of claim 13 wherein setting a probability of an entity in the class entity dictionary comprises adding the modified entity to the class entity dictionary and selecting a probability for the modified entity.

15. The computer-readable medium of claim 14 wherein selecting a probability for the modified entity comprises estimating the probability based on a probability associated with the decoded entity.

16. The computer-readable medium of claim 15 wherein estimating the probability further comprises estimating the probability based on a probability associated with a class in the class entity dictionary.

17. The computer-readable medium of claim 16 wherein estimating the probability comprises estimating the probability based on an n-gram probability associated with the decoded entity and an n-gram probability associated with the class.

18. The computer-readable medium of claim 13 wherein setting a probability of an entity comprises increasing the probability of an entity.

19. The computer-readable medium of claim 18 wherein setting a probability further comprises: determining that the modified entity is in the class entity dictionary; and increasing the probability of the modified entity.

20. The computer-readable medium of claim 13 wherein setting the probability of an entity comprises decreasing the probability of an entity.

21. The computer-readable medium of claim 20 wherein setting a probability further comprises: determining that the decoded entity is in the class entity dictionary; and decreasing the probability of the decoded entity.

22. A method of adapting a class entity dictionary used with a class-based language model, the method comprising: receiving a user modification of a sequence of words that were identified based in part on the class-based language model; identifying a decoded segment that has been modified to become a modified segment in the user modification; and determining a probability for the modified segment based in part on the decoded segment.